Kafka Architecture | Crisp & Clear

 



In this blog we are going to see what Kafka is and its architecture in a crisp and clear manner. After going through this blog you will have a basic idea of Kafka. Kafka is a distributed streaming platform, known for its reliability, scalability, and fault tolerance.

  • So in simple terms, why do we need Kafka? I will give you a scenario to understand this, and it is one of the scenarios where Kafka is used. Let's assume we have a streaming job running 24*7. Due to some reason it fails, and it takes some time to recover. In the meantime the data would be lost; to avoid that we can use Kafka.

  • So Kafka is used to store the stream of data in a distributed manner, and it uses offsets. Using the offset, the streaming job can continue reading the data after recovery from exactly where it stopped due to the failure.

  • Another main scenario is when many consumers want to consume data from the same source, like SSMS, and the source cannot handle the load of all that access. In that scenario Kafka helps: the source streams the data to Kafka, and the consumers connect to Kafka and consume the data from there.

  • We can set a retention period in Kafka; data that has been present longer than that retention period is deleted. And do you know which company developed Kafka? It's none other than LinkedIn. One interesting fact: "The New York Times" newspaper chose Kafka for storing its data and set the retention period to 100 years. Interesting, right?

Now let's see the architecture of Kafka.


          


[Architecture diagram: Producer, Kafka Cluster, Consumer]

From the above architectural diagram, we can see that there are three main components:
    
  1. Producer
  2. Kafka Cluster
  3. Consumer

Let's see about them one by one in simple words.

Producer:

  • Producers are nothing but the sources that stream the data to the Kafka cluster.

  • Producers play a vital role in the publish-subscribe model, producing data that can be consumed by one or more subscribers (consumers).

  • Producers use the Kafka Producer API to interact with the Kafka cluster, specifying topics, keys, and values for messages (see the sketch after this list).

  • Proper configuration and best practices ensure scalable, reliable, and efficient data streaming in Kafka-based applications.
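To make this concrete, here is a minimal Java sketch of a producer. The broker address localhost:9092 and the topic name "events" are assumptions for the sketch, not fixed Kafka names:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A record names the topic, an optional key, and a value.
                producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
                producer.flush(); // block until buffered records reach the cluster
            }
        }
    }

Note that send() is asynchronous; the client batches records in the background, which is part of how Kafka achieves high throughput.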

Consumer:

  • Consumers are nothing but the resources that read the data from the Kafka cluster.

  • Consumers play a crucial role in real-time data processing, allowing applications to consume and react to events as they occur.

  • Consumers use the Kafka Consumer API, managing topics, partitions, and offsets for seamless data retrieval (see the sketch after this list).

  • Effective consumer configuration and error handling are essential for building responsive and fault-tolerant data streaming applications in Kafka.
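Below is a matching consumer sketch, again assuming localhost:9092 and the "events" topic; the group id "events-readers" is also just an illustrative name. The manual offset commit at the end of each loop is what lets a restarted job resume exactly where it stopped, as in the failure scenario described earlier:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("group.id", "events-readers");          // illustrative group name
            props.put("key.deserializer", StringDeserializer.class.getName());
            props.put("value.deserializer", StringDeserializer.class.getName());
            props.put("auto.offset.reset", "earliest"); // read from the beginning if no offset exists
            props.put("enable.auto.commit", "false");   // we commit offsets manually below

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("events"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> r : records) {
                        System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                                r.partition(), r.offset(), r.key(), r.value());
                    }
                    // Committed offsets are how a restarted consumer resumes
                    // exactly where it left off.
                    consumer.commitSync();
                }
            }
        }
    }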

Kafka Cluster:

  • Now here comes the main topic of this blog, the Kafka cluster. A Kafka cluster consists of many Kafka brokers. Brokers are nothing but machines or nodes. Each broker holds many topics, and the streamed data is stored in those topics in append mode.
  • Note that Kafka does not update or process the data; it simply appends it. The data is appended in a sequential manner when there is a single partition per topic. A sketch of creating a topic follows.
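Here is how a topic can be created through the Kafka AdminClient API; the broker address, topic name, partition count, and replication factor are all illustrative choices for this sketch:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker

            try (AdminClient admin = AdminClient.create(props)) {
                // 4 partitions, replication factor 3 (requires at least 3 brokers).
                NewTopic topic = new NewTopic("events", 4, (short) 3);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }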

Data Transfer:

  • The data is sent and stored as bytes, and Kafka does not know the datatype; it doesn't even know whether the data is a string or an integer (see the sketch below).
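A small sketch of this type-blindness: if we configure byte-array serializers, we hand Kafka raw bytes directly, and only our own code knows what they mean. The broker address and topic name are assumptions, as before:

    import java.nio.charset.StandardCharsets;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class RawBytesProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("key.serializer", ByteArraySerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // Kafka stores exactly these bytes; whether they encode a string,
                // an integer, or JSON is known only to producer and consumer code.
                byte[] value = "42".getBytes(StandardCharsets.UTF_8);
                producer.send(new ProducerRecord<>("events", null, value));
                producer.flush();
            }
        }
    }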

Retention Period:
  • Based on the defined retention period, the data stored in Kafka is deleted. That is, in case we set it to 7 days, the data present in the Kafka topic will only be available for 7 days (see the sketch below).
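Retention is a per-topic configuration named retention.ms. Here is a sketch of setting it to 7 days on the assumed "events" topic with the AdminClient (incrementalAlterConfigs needs kafka-clients 2.3+):

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class SetRetention {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker

            try (AdminClient admin = AdminClient.create(props)) {
                ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");
                // retention.ms is in milliseconds: 7 days = 604800000 ms.
                AlterConfigOp setRetention = new AlterConfigOp(
                        new ConfigEntry("retention.ms", "604800000"),
                        AlterConfigOp.OpType.SET);
                admin.incrementalAlterConfigs(Collections.singletonMap(
                        topic, Collections.singletonList(setRetention))).all().get();
            }
        }
    }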

Now let's break down the architecture in more depth.
 

         



[Diagram: Kafka cluster with multiple brokers holding replicated copies of the data, monitored by ZooKeeper]

We know that the Kafka cluster will have many brokers. Now we need to know about an important topic called replication. Kafka gives us the opportunity to replicate the data that we store. If the replication factor is 3, the data stored on one broker will be replicated to another 2 brokers. Replication means we will have an exact copy of the data. ZooKeeper monitors the cluster; in case of any broker failure, ZooKeeper points the producer or the consumer to the next broker that has an exact replica of the data. Hence there is no breakage of the flow. We can inspect the replicas of a topic from code, as the sketch below shows.
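This sketch lists the leader, replicas, and in-sync replicas (ISR) of each partition via the AdminClient; allTopicNames() needs kafka-clients 3.1+, and the topic name is an assumption:

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.TopicDescription;
    import org.apache.kafka.common.TopicPartitionInfo;

    public class DescribeReplication {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker

            try (AdminClient admin = AdminClient.create(props)) {
                TopicDescription desc = admin.describeTopics(Collections.singletonList("events"))
                        .allTopicNames().get().get("events");
                for (TopicPartitionInfo p : desc.partitions()) {
                    // Each partition has one leader broker; "isr" lists the
                    // replicas currently in sync with that leader.
                    System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr());
                }
            }
        }
    }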

Now let us look deeper into the broker.


       


[Diagram: a broker where Topic A is split into four partitions]

  • We know that there will be many topics in a broker where the data gets appended. To make this process parallel, there is the concept of partitions. From the above diagram we can see that Topic A has 4 partitions.
  • When a topic has a single partition, the storage of data is in sequence. But when we have multiple partitions, there is no guarantee of the sequence of data between the partitions; the data will only be in sequence within a partition.

  • So if we need the data in sequence on the consumer side, we get help from the 'key'. Each record can carry a key, and records with the same key go to the same partition only. This way, by assigning keys on the producer side and using them in the consumer-side code, we achieve sequencing within a partition (see the sketch after this list).
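Here is a sketch of key-based partitioning: Kafka's default partitioner hashes the key, so every record with the same key lands on the same partition and keeps its relative order. The broker, topic, and the key "order-7" are illustrative:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.clients.producer.RecordMetadata;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class KeyedProducer {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumed broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                for (int i = 1; i <= 3; i++) {
                    // Same key, so same partition, so the updates stay in order.
                    RecordMetadata meta = producer.send(
                            new ProducerRecord<>("events", "order-7", "update-" + i)).get();
                    System.out.printf("key=order-7 -> partition=%d offset=%d%n",
                            meta.partition(), meta.offset());
                }
            }
        }
    }

All three records report the same partition number with increasing offsets, which is exactly the per-key ordering guarantee described above.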


  • Kafka can be used for a variety of use cases, including:

Log aggregation: It can be used to collect and aggregate logs from applications and systems.
Event streaming: It can be used to stream events from applications and systems.
Machine learning: It can be used to feed data to machine learning models.
IoT: It can be used to collect and process data from IoT devices.


  • There are many benefits to using Kafka, including:

Scalability: It can scale to handle millions of messages per second.
Reliability: It is highly reliable and fault-tolerant.
Durability: Kafka data is stored in a durable log that is replicated across multiple brokers.
Performance: Kafka is very fast and can handle high throughputs.
Ease of use: Kafka is easy to use and integrate with other systems.


Thus in this blog we saw the Kafka architecture in a crisp and clear manner. Along with that, we also saw some of the use cases and benefits of Kafka. Hope this blog was helpful.

Thank you !!!






